class: center, middle, inverse, title-slide # ISA 401: Business Intelligence & Data Visualization ## 01: Introduction to BI and Data Viz ###
Fadel M. Megahed, PhD
Associate Professor
Department of Information Systems and Analytics
Farmer School of Business
Miami University
Twitter:
FadelMegahed
GitHub:
fmegahed
Email:
fmegahed@miamioh.edu
Office Hours:
Automated Scheduler for Virtual Office Hours
### Spring 2022

---

# Learning Objectives for Today's Class

- Describe **course objectives** and **structure**.
- Define **data visualization** and describe its **main goals**.
- Describe the **BI methodology** and its **major concepts**.

---
class: inverse, center, middle

# Course Design, Expectations, and Overview

---

# The Analytics Journey: Pre-Analytics [1]

- **Pre-Analytics/Data Management:** where one attempts to **extract** the needed *data* for analysis. Data can either be:

.div[
.pull-left[
## .center[.large[.large[.large[🥫]]]]
* Stale, uninteresting, convenient
* Highly processed and archived
* Examples: `iris`, `mtcars`, `titanic`
]

.pull-right[
## .center[.large[.large[.large[🍅]]]]
* Fresh, interesting, challenging
* Impactful and sometimes locally collected
* Examples: [Cincinnati Open Data Portal](https://data.cincinnati-oh.gov/), [Ohio Data Portal](https://data.ohio.gov/wps/portal/gov/data/), [US Government's Open Data](https://www.data.gov/).
]
]

.footnote[
<html> <hr> </html>

**Footnotes:**
- While highly processed data can be useful for learning basic concepts, **real-world (often messy)** data are much more interesting to work with -- **e.g., we can make useful & meaningful decisions from the data.** In this class, we will learn how to scrape, extract and clean messy data in addition to visualizing clean[ed] data.
- Source: Slide inspired by [Kia Ora's What I mean by "data"](https://stats220.earo.me/01-intro.html#6).
]

---

# The Analytics Journey: Pre-Analytics [2]

### Non-Graded Class Activity # 1
> _Take 5 minutes to discuss with your partner_

.panelset[
.panel[.panel-name[Activity]

- Go to <https://data.cincinnati-oh.gov/Safety/Traffic-Crash-Reports-CPD-/rvmt-pkmq/data>
- Download the data using the site's Export option and answer the following questions:
    * How many **observations/rows** and **columns** do we have in the dataset?
    * How many **crashes** are reported in the dataset?
]

.panel[.panel-name[Your Solution]

- Insert your solution here, **which you can do by clicking on the pencil icon on the top right of the screen**. This will likely only work if you are viewing this file outside of Canvas.
]

.panel[.panel-name[Fadel's Approach (No Solution Shown)]

```r
# Install the tidyverse only if it is not already available
if(require(tidyverse) == FALSE) install.packages("tidyverse")

# Link obtained from site -> Export -> "Right Click on" CSV
crashes = readr::read_csv("https://data.cincinnati-oh.gov/api/views/rvmt-pkmq/rows.csv?accessType=DOWNLOAD")

# Number of rows and columns
nrow(crashes)
ncol(crashes)

# Or alternatively
dim(crashes)

# Total number of crashes
# Will be discussed in class in greater detail
```
]
]

---

# The Analytics Journey: Descriptive [1]

- **Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.

### Descriptive Statistics for 2 Categorical Variables

```
## $dayofweek
## 
##  FRI  SAT  TUE  MON  WED  SUN  THU 
## 5036 3735 4199 4093 4264 3291 4592 
## 
## $weather
## 
##                             1 - CLEAR                              4 - RAIN 
##                                 19902                                  3771 
##                            2 - CLOUDY                    99 - OTHER/UNKNOWN 
##                                  4550                                   267 
##                              6 - SNOW                  3 - FOG, SMOG, SMOKE 
##                                   638                                    18 
##    8 - BLOWING SAND, SOIL, DIRT, SNOW 9 - FREEZING RAIN OR FREEZING DRIZZLE 
##                                     2                                    20 
##                       5 - SLEET, HAIL                 7 - SEVERE CROSSWINDS 
##                                    41                                     2
```

---

# The Analytics Journey: Descriptive [2]

- **Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.
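The two kinds of descriptive summaries named above (frequency counts for categorical variables, and a bar chart of those counts) can be sketched in a few lines of base R. This is only an illustration, not the code used to produce the slides: the data frame `toy_crashes` is a small, made-up stand-in for the crash data, and only its column names (`dayofweek`, `weather`) mirror the real dataset, so the counts it produces are not the real ones.

```r
# Toy stand-in for the crash data: six hypothetical crash reports.
# The column names mirror the real dataset; the values are made up.
toy_crashes <- data.frame(
  dayofweek = c("FRI", "FRI", "SAT", "MON", "FRI", "MON"),
  weather   = c("1 - CLEAR", "4 - RAIN", "1 - CLEAR",
                "1 - CLEAR", "2 - CLOUDY", "4 - RAIN"),
  stringsAsFactors = FALSE
)

# Descriptive statistics: one frequency table per categorical variable
freq_tables <- lapply(toy_crashes[c("dayofweek", "weather")], table)
print(freq_tables)

# A simple visualization: bar chart of crashes per day, tallest bar first
day_counts <- sort(freq_tables$dayofweek, decreasing = TRUE)
barplot(day_counts, xlab = "Day of week", ylab = "Number of crashes",
        main = "Crashes per day (toy data)")
```

Replacing `toy_crashes` with the `crashes` data frame read from the portal should give the same kind of summary for the real data, assuming the downloaded file still uses these column names.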
### A Simple Visualization - A Bar Chart of Crashes Per Day <img src="data:image/png;base64,#01_Introduction_files/figure-html/viz-1.png" style="display: block; margin: auto;" /> --- # The Analytics Journey: Descriptive [3] - **Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**. <img src="data:image/png;base64,#01_Introduction_files/figure-html/viz2a-1.png" style="display: block; margin: auto;" /> --- # The Analytics Journey: Descriptive [4] - **Descriptive Analytics:** where one attempts to **understand** the data through **descriptive statistics** and **visualizations**.  --- # The Analytics Journey: Descriptive [5]
> _Take 3 minutes for this activity_
.panelset[
.panel[.panel-name[Activity]

- How do the previous two graphs complement each other?
- If you were to pick one of the two charts, which one is more informative?
- You will be asked to write down your answers on <https://www.menti.com> in the next two panels.
]

.panel[.panel-name[Q1 Solution]
<html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 6px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='315' src='https://www.mentimeter.com/embed/f00b3517a24d41dfb925639f375501d0/7187998e2fd8' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html>
]

.panel[.panel-name[Q2 Solution]
<html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 6px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='315' src='https://www.mentimeter.com/embed/9ad9847d9714ef4c50e9e5cd1ab83ad2/b258ac9f6e85' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html>
]
]

---

# The Analytics Journey: Predictive [1]

- **Predictive Analytics:** where **statistical** and **machine learning** models are used to help us utilize independent variable[s] to predict an outcome variable of choice (which can be binary/dichotomous, multinomial/multi-class, or continuous).

    * From my teaching/research/consulting experience, **many** consider this component to be the 🍰 aspect of the analytics journey.
    * That being said, your success in this stage **hinges on having**:
        + **Correct** ✅ data, i.e.,
            - *Do you actually capture the important predictors as potential independent variables?*
            - *Is your data aggregated to the right level?*
        + **Cleaned** 🛀 data, i.e.,
            - *Is your data tidy?*
            - *Is your data technically correct?*
            - *Is your data consistent?*
    * With the above constraints/setup, you can now explore how to model the data using statistical and machine learning models. **Some recommendations:**
        + Start with the simplest (which is often also the easiest to explain) model first.
        + If you are happy with the predictive performance (i.e., no gains would be of practical benefit), you are done 👏.
        + If not, ↩️ and try other models.

---

# The Analytics Journey: Predictive [2]

- **Predictive Analytics:** where **statistical** and **machine learning** models are used to help us utilize independent variable[s] to predict an outcome variable of choice (which can be binary/dichotomous, multinomial/multi-class, or continuous).

**For two transportation safety examples, see:**

<img src="data:image/png;base64,#figures/cai_2021.png" width="50%" /><img src="data:image/png;base64,#figures/mehdizadeh_2021.png" width="50%" />

---

# The Analytics Journey: Prescriptive [1]

- **Prescriptive Analytics:** where **mathematical models** are used to make recommendations for business actions.

- Recall that our **overarching goal** behind data/business analytics is to **make informed decisions based on what we have learned from the data**. Hence, this stage is where we build on what we learned during the *descriptive* and *predictive* stages to make more informed decisions.

- Now imagine that you are a large trucking company (e.g., Amazon, FedEx, JB Hunt, etc.), and you have models that show **both**:
    * Your on-board sensors capture safety-critical driving events that are associated with crashes.
    * You have a reasonable model that helps you predict the occurrence of safety-critical events as a function of:
        + Driver characteristics
        + Weather conditions
        + Traffic conditions

- **As a business analyst, what two reasonable questions would you attempt to approach/optimize for?**

---

# The Analytics Journey: Prescriptive [2]

- **Prescriptive Analytics:** where **mathematical models** are used to make recommendations for business actions.

### Non-Graded Class Activity # 3
> _Take 3 minutes to formulate the two best questions with your partner_

.panelset[
.panel[.panel-name[Activity]

**As a business analyst, what two reasonable questions would you attempt to approach/optimize for?**
]

.panel[.panel-name[Your Solution]

- Insert your solution here, **which you can do by clicking on the pencil icon on the top right of the screen**. This will likely only work if you are viewing this file outside of Canvas.
]

.panel[.panel-name[Our Work in this Area]
<img src="data:image/png;base64,#figures/gary_nashville_ranking.PNG" title="k-SP solution to feasible routes between Gary, IN and Nashville, TN" alt="k-SP solution to feasible routes between Gary, IN and Nashville, TN" width="450px" style="display: block; margin: auto;" />
]
]

---

# How does our Curriculum at Miami University Prepare you for this Journey?

<div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/ba_flow_chart.png" alt="Fadel's take on our ISA curriculum" width="100%" /> <p class="caption">My take on the courses within the business analytics major/minor at Miami University</p> </div>

---

# ISA 401/501 Course: An Overview

<div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/course_overview.png" alt="How the ISA 401/501 course is organized." width="100%" /> <p class="caption">How the ISA 401/501 course is organized.</p> </div>

---

# ISA 401/501 Course Objectives

Even though software will be extensively used, this is not a software class. **Instead, the focus is on understanding the underlying methods and mindset of how data should be approached.**

- Be capable of extracting, transforming and loading (ETL) data using multiple platforms (e.g., R, Power BI and/or Tableau).
- Write basic R scripts to preprocess and clean the data.
- Explore the data using visualization approaches that are based on sound human factors.
- Understand how statistical/machine learning can capitalize on the insights generated from the data visualization process.
- Create interactive dashboards that can be used for business decision making, reporting and/or performance management.
- Be able to apply the skills from this class in your future career.

---

# Should you Care? Indeed [1]

<div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/dataClean.jpg" alt="Entry-Level Data Wrangling Jobs on Indeed.com" width="50%" /> <p class="caption">Entry-Level Data Wrangling Jobs on Indeed.com</p> </div>

---

# Should you Care? Indeed [2]

<div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/dataViz.jpg" alt="Entry-Level Data Visualization Jobs on Indeed.com" width="50%" /> <p class="caption">Entry-Level Data Visualization Jobs on Indeed.com</p> </div>

---

# Should you Care? Read this Job Ad

When I designed this course, I incorporated a lot of feedback from **industry collaborators, peer/leading academic programs, and state-of-the-art research advancements.** Thus, this is meant to be a hands-on, practically relevant course.

### Non-Graded Class Activity # 4
> _Take 4 minutes to pinpoint the **skills and qualifications that you have prior to taking this class**, and **document what you will learn in this course to make you more competitive**._ .panelset[ .panel[.panel-name[Activity] To demonstrate the practicality of this course, let us consider [this job ad](https://www.indeed.com/viewjob?jk=77f4cf1687882e41&tk=1eff1eg1op7cg800&from=serp&vjs=3). - Please open the Data Scientist (6257U) - CED Data Scientist position at UC - Berkeley by clicking [here](https://www.indeed.com/viewjob?jk=77f4cf1687882e41&tk=1eff1eg1op7cg800&from=serp&vjs=3). - Compare the **responsibilities** and the **required qualifications** with the course objectives. - Read through the required qualifications. - **Document what you will learn in this course to make you more competitive.** ] .panel[.panel-name[Documentation Space] ] ] --- # Should you Care? Recent Alumni Testimonials <img src="data:image/png;base64,#figures/email1.jpg" width="100%" style="display: block; margin: auto;" /><br> <br><img src="data:image/png;base64,#figures/email2.jpg" width="100%" style="display: block; margin: auto;" /> --- # Instructional Approach <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/instructional_approach.png" alt="An overview of the instructional approach for ISA 401/501." width="100%" /> <p class="caption">An overview of the instructional approach for ISA 401/501.</p> </div> --- # How will I Evaluate your Learning? <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/evaluation.png" alt="An overview of the evaluation components for ISA 401/501." width="100%" /> <p class="caption">An overview of the evaluation components for ISA 401/501.</p> </div> --- class: inverse, center, middle # Introductions: Getting to Know Each Other --- # About Me – My route to Miami University - Application of data-driven decisions (D3) in 3 continents. 
- **Interests:** Applications in logistics, manufacturing, occupational safety & portfolios. - **Collaborations with:** Aflac, GE Research, IBM Research, JB Hunt, & Tennibot <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/map.JPG" alt="My journey with data driven decisions." width="100%" /> <p class="caption">My journey with data-driven decision making.</p> </div> --- # An Overview of My Research Portfolio <div class="figure" style="text-align: center"> <img src="data:image/png;base64,#01_Introduction_files/figure-html/research_megahed3-1.png" alt="My work can be grouped into four different clusters" width="100%" /> <p class="caption">My work can be grouped into four different clusters.</p> </div> --- # Your Academic/Professional Experience <html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 6px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='315' src='https://www.mentimeter.com/embed/f7eacf99768cd22270c18982864157f2/ea0cb318b3e0' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html> --- # Getting to Know Your Learning Objectives <html> <div style='position: relative; padding-bottom: 56.25%; padding-top: 6px; height: 0; overflow: hidden;'><iframe sandbox='allow-scripts allow-same-origin allow-presentation' allowfullscreen='true' allowtransparency='true' frameborder='0' height='315' src='https://www.mentimeter.com/embed/3d85154a0564812d34da0bb42e349428/6c73bccc31d3' style='position: absolute; top: 0; left: 0; width: 100%; height: 100%;' width='420'></iframe></div> </html> --- class: inverse, center, middle # So What is Data Visualization? --- # What is Data Visualization? Data visualization involves **presenting data in a graphical format**. 
It is really a process: getting the data, creating initial plot(s), and then modifying them to answer questions of interest (and possibly to make the plot aesthetically pleasing). For example, see [Cedric Scherer's visualization of the UNESCO data on global student to teacher ratios](https://www.cedricscherer.com/2019/05/17/the-evolution-of-a-ggplot-ep.-1/).

<img src="data:image/png;base64,#https://d33wubrfki0l68.cloudfront.net/1e7033393a2c70dc1255c5d0f1c563e945519251/61035/img/evol-ggplot/evol-ggplot-1.gif" width="58%" style="display: block; margin: auto;" />

---

# The Goals of Data Visualization

- **Record** information
- **Analyze** data to support reasoning
    * Develop and assess hypotheses (EDA)
    * Reveal patterns
    * Discover errors in data
- **Communicate** ideas to others
    * Infographics
    * Statistical charts
    * Interactive charts
    * Dashboards
- **Interact with the data (which supports all the above)**

---

# Record: My Great Grandparents

<img src="data:image/png;base64,#figures/egypt.jpg" width="90%" style="display: block; margin: auto;" />

---

# Record: A More Modern Example

<html> <center> <blockquote class="twitter-tweet"><p lang="en" dir="ltr">I'm a sucker for clean tables. Last week, I used <a href="https://twitter.com/hashtag/RStats?src=hash&ref_src=twsrc%5Etfw">#RStats</a> and gtExtra magic to summarize by Peloton data.<br><br>This week, I couldn't resist taking reactablefmtr for a test drive too.
<a href="https://twitter.com/kc_analytics?ref_src=twsrc%5Etfw">@kc_analytics</a>, this package is beautiful!<br><br>🔗: <a href="https://t.co/9KZHjRsJFM">https://t.co/9KZHjRsJFM</a> <a href="https://t.co/Z18ddDM9SR">pic.twitter.com/Z18ddDM9SR</a></p>&mdash; Tanya Shapiro (@tanya_shapiro) <a href="https://twitter.com/tanya_shapiro/status/1480648097533509640?ref_src=twsrc%5Etfw">January 10, 2022</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script> </center> </html>

---

# Analyze Data

<img src="data:image/png;base64,#01_Introduction_files/figure-html/cincy_crashes-1.png" style="display: block; margin: auto;" />

---

# Reveal Patterns: The 1854 Cholera Outbreak

<div class="figure" style="text-align: center"> <img src="data:image/png;base64,#figures/cholera.jpg" alt="The physician John Snow, dealing with a cholera outbreak, plotted the cases on a map of the city (see schematic above)." width="35%" /> <p class="caption">The physician John Snow, dealing with a cholera outbreak, plotted the cases on a map of the city (see schematic above).</p> </div>

.footnote[
<html> <hr> </html>

**Credits:**
- Source: Leskovec, J., Rajaraman, A., & Ullman, J. D. (2020). Mining of Massive Datasets (Third Edition). Cambridge University Press. Image is from Chapter 1, which can be accessed [here](http://infolab.stanford.edu/~ullman/mmds/ch1n.pdf).
]

---

# Reveal Patterns: COVID-19 Vaccination Rates

<img src="data:image/png;base64,#figures/animatedVaccineMap.gif" width="70%" style="display: block; margin: auto;" />

---

# Communicate Ideas: C.J. Minard, 1869

<img src="data:image/png;base64,#figures/minard.png" width="100%" style="display: block; margin: auto;" />

---

# Communicate Ideas
.panelset[
.panel[.panel-name[Activity]

.pull-left[
### Non-Graded Class Activity #5

> _Take 5 minutes to discuss this visualization from the Washington Post with a colleague_

- Who is the target audience?
- What is the data represented in this visualization? Be specific.
- How is the data visually encoded?
- Do you like/dislike this visualization? Why?
- Would you create a visualization like this for a similar dataset? Why? Why not?
]

.pull-right[
<img src="data:image/png;base64,#figures/wpost.jpg" width="77%" style="display: block; margin: auto;" />
]
]

.panel[.panel-name[Your Solution]
]
]

---

# Interact

<html> <div style="max-width:854px"><div style="position:relative;height:0;padding-bottom:56.25%"><iframe src="https://embed.ted.com/talks/lang/en/hans_rosling_the_best_stats_you_ve_ever_seen" width="854" height="480" style="position:absolute;left:0;top:0;width:100%;height:100%" frameborder="0" scrolling="no" allowfullscreen></iframe></div></div> </html>

---
class: inverse, center, middle

# Business Intelligence: From Visualizations to Dashboards to Insights

---

# What is Business Intelligence?

"... to enable **interactive access (sometimes in real time)** to data, to enable manipulation of data, and to give business managers and analysts the ability to conduct appropriate analysis. By analyzing ... data, situations, and performances, decision makers get valuable insights that enable them to **make more informed and better decisions** ... BI is based on the **transformation of data to information, then to decisions, and finally to actions.**"

<img src="data:image/png;base64,#figures/stock_market.JPG" title="A schematic of an interactive BI tool for stock market prediction" alt="A schematic of an interactive BI tool for stock market prediction" width="55%" style="display: block; margin: auto;" />

.footnote[
<html> <hr> </html>

- **Quote** from Sharda, R., Delen, D., & Turban, E. (2013). Business Intelligence: A managerial perspective on analytics. Prentice Hall Press.
- **Image Credit:** Joint work with Bin Weng. ] --- # The BI Process <img src="data:image/png;base64,#figures/bi_process.jpg" title="A schematic of the different components of the business intelligence (BI) process" alt="A schematic of the different components of the business intelligence (BI) process" width="73%" style="display: block; margin: auto;" /> .footnote[ <html> <hr> </html> - **Image Credit:** Sharda, R., Delen, D., & Turban, E. (2013). Business Intelligence: A managerial perspective on analytics. Prentice Hall Press. ] --- class: inverse, center, middle # Recap --- # Summary of Main Points By now, you should be able to do the following: